Tagging Portuguese With A Spanish Tagger
Abstract
We describe a knowledge- and resource-light system for automatic morphological analysis and tagging of Brazilian Portuguese. We avoid the use of labor-intensive resources, particularly large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese; (ii) an unannotated corpus of Portuguese; and (iii) a description of Portuguese morphology at the level of a basic grammar book. We extend our earlier work (Hana et al., 2004; Feldman et al., 2006) by proposing an alternative algorithm for cognate transfer that effectively projects the Spanish emission probabilities into Portuguese. Our experiments require minimal new human effort and show a 21% error reduction over even emissions on a fine-grained tagset.
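The abstract does not spell out the cognate-transfer algorithm, so the following is only a minimal sketch of the general idea of projecting emission probabilities P(word | tag) from Spanish onto similar-looking Portuguese words. Everything in it is an assumption made for illustration: the function names `similarity` and `project_emissions`, the use of `difflib.SequenceMatcher` as a stand-in for a real cognate measure, the threshold of 0.75, and the toy probabilities.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a real cognate measure."""
    return SequenceMatcher(None, a, b).ratio()


def project_emissions(es_emissions, pt_vocab, threshold=0.75):
    """Project P(word | tag) from Spanish words onto Portuguese cognates.

    es_emissions: {tag: {spanish_word: probability}}
    pt_vocab: iterable of Portuguese word forms
    Returns {tag: {portuguese_word: probability}}, renormalized per tag.
    The threshold is an arbitrary illustrative value.
    """
    es_words = {w for dist in es_emissions.values() for w in dist}
    if not es_words:
        return {}

    # Pair each Portuguese word with its most similar Spanish word,
    # keeping the pair only if it clears the similarity threshold.
    cognate = {}
    for pt in pt_vocab:
        best = max(es_words, key=lambda es: similarity(pt, es))
        if similarity(pt, best) >= threshold:
            cognate[pt] = best

    # Copy each Spanish emission probability to the paired Portuguese
    # word, then renormalize so each tag's distribution sums to 1.
    projected = defaultdict(dict)
    for tag, dist in es_emissions.items():
        for pt, es in cognate.items():
            if es in dist:
                projected[tag][pt] = dist[es]
        total = sum(projected[tag].values())
        if total > 0:
            projected[tag] = {w: p / total for w, p in projected[tag].items()}
    return dict(projected)


# Toy data, invented for illustration only.
es_emissions = {
    "N": {"libro": 0.6, "casa": 0.4},
    "V": {"cantar": 1.0},
}
pt_vocab = ["livro", "casa", "cantar", "obrigado"]
result = project_emissions(es_emissions, pt_vocab)
```

In this toy run, "livro" is paired with "libro" and inherits its noun probability, while "obrigado" has no sufficiently similar Spanish form and receives no projected mass. A real system would have to give such words probability from another source, for example a morphological analyzer.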
Related resources
Tagging Portuguese with a Spanish Tagger Using Cognates
We describe a knowledge- and resource-light system for automatic morphological analysis and tagging of Brazilian Portuguese. We avoid the use of labor-intensive resources, particularly large annotated corpora and lexicons. Instead, we use (i) an annotated corpus of Peninsular Spanish, a language related to Portuguese; (ii) an unannotated corpus of Portuguese; (iii) a description of Portugue...
Mining for unambiguous instances to adapt part-of-speech taggers to new domains
We present a simple, yet effective approach to adapt part-of-speech (POS) taggers to new domains. Our approach only requires a dictionary and large amounts of unlabeled target data. The idea is to use the dictionary to mine the unlabeled target data for unambiguous word sequences, thus effectively collecting labeled target data. We add the mined instances to available labeled newswire data to t...
WANN-Tagger - A Weightless Artificial Neural Network Tagger for the Portuguese Language
Weightless Artificial Neural Networks have proved to be a promising paradigm for classification tasks. This work introduces the WANN-Tagger, which makes use of weightless artificial neural networks for labelling Portuguese sentences, tagging each of its terms with its respective part-of-speech. A first experimental evaluation using the CETENFolha corpus indicates the usefulness of this paradigm...
Part-of-Speech Tagging for English-Spanish Code-Switched Text
Code-switching is an interesting linguistic phenomenon commonly observed in highly bilingual communities. It consists of mixing languages in the same conversational event. This paper presents results on Part-of-Speech tagging Spanish-English code-switched discourse. We explore different approaches to exploit existing resources for both languages that range from simple heuristics, to language id...